Adaptation Speed Analysis for Fairness-aware Causal Models
For example, to achieve bidirectional translation between two languages in
machine translation, the source corpus is often reused as the target corpus,
which requires training two models in opposite directions. Which of the two can
adapt more quickly to a domain shift is a question of significant importance in
many fields. Specifically, consider an original distribution p that changes due
to an unknown intervention, resulting in a modified distribution p*. In
aligning p with p*, several factors can affect the adaptation rate, including
the causal dependencies between variables in p. In real-life scenarios,
however, we must also consider the fairness of the training process, and it is
particularly important to account for a sensitive variable (bias) that lies
between the cause and effect variables. To explore this scenario, we examine a
simple structural causal model (SCM) with a cause-bias-effect structure, where
variable A acts as a sensitive variable between cause (X) and effect (Y). The
two models exhibit, respectively, consistent and contrary cause-effect
directions in the cause-bias-effect SCM. By conducting unknown interventions on
variables within the SCM, we simulate several kinds of domain shifts for
analysis. We then compare the adaptation speeds of the two models across four
shift scenarios. Additionally, we prove the connection between the adaptation
speeds of the two models across all interventions.
Comment: CIKM 202
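The setup described above can be illustrated with a toy simulation. The sketch below assumes a minimal linear-Gaussian version of the cause-bias-effect SCM (X -> A -> Y) and a mean-shift intervention on the cause; the function names, coefficients, and noise scales are illustrative choices, not the paper's actual model.

```python
import numpy as np

def sample_scm(n, rng=None):
    """Sample a toy linear-Gaussian cause-bias-effect SCM: X -> A -> Y,
    where A is the sensitive variable sitting between cause X and effect Y."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, n)      # cause
    a = x + rng.normal(0.0, 0.1, n)  # sensitive variable (bias)
    y = a + rng.normal(0.0, 0.1, n)  # effect
    return x, a, y

def intervene_on_cause(n, new_mean, rng=None):
    """Simulate a domain shift: an intervention replaces the cause's
    mechanism, shifting its mean; A and Y inherit the shift downstream."""
    rng = rng if rng is not None else np.random.default_rng(1)
    x = rng.normal(new_mean, 1.0, n)
    a = x + rng.normal(0.0, 0.1, n)
    y = a + rng.normal(0.0, 0.1, n)
    return x, a, y
```

Comparing how quickly each of the two models (consistent vs. contrary cause-effect direction) re-fits data drawn from `intervene_on_cause` would mirror the adaptation-speed comparison in the abstract.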
Robust Semi-Supervised Learning with Out of Distribution Data
Recent semi-supervised learning (SSL) work shows significant improvement in
SSL algorithms' performance using better unlabeled-data representations.
However, recent work [Oliver et al., 2018] shows that an SSL algorithm's
performance can degrade when the unlabeled set contains out-of-distribution
examples (OODs). In this work, we first study the critical causes of OODs'
negative impact on SSL algorithms. We find that (1) an OOD's effect on an SSL
algorithm's performance increases as its distance to the decision boundary
decreases, and (2) Batch Normalization (BN), a popular module, can degrade
performance rather than improve it when the unlabeled set contains OODs. To
address these causes, we propose a novel unified robust SSL approach that can
be easily extended to many existing SSL algorithms and improve their robustness
against OODs. In particular, we propose a simple modification of batch
normalization, called weighted batch normalization, that improves BN's
robustness against OODs. We also develop two efficient hyperparameter
optimization algorithms with different tradeoffs between computational
efficiency and accuracy. Extensive experiments on synthetic and real-world
datasets show that our proposed approaches significantly improve the
robustness of four representative SSL algorithms against OODs compared with
four state-of-the-art robust SSL approaches.
Comment: Preprint
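The core idea of weighted batch normalization can be sketched as follows: compute batch statistics with per-sample weights so that suspected OOD examples contribute little to the mean and variance estimates. This is an illustrative reconstruction from the abstract's description, not the paper's actual implementation (which would also include the learnable scale/shift parameters and a weighting scheme).

```python
import numpy as np

def weighted_batch_norm(x, w, eps=1e-5):
    """Normalize activations x of shape (N, D) using weighted batch
    statistics. w is a (N,) vector of nonnegative per-sample weights;
    giving a suspected OOD sample weight 0 removes its influence on
    the estimated mean and variance."""
    w = w / (w.sum() + eps)                       # normalize weights
    mean = (w[:, None] * x).sum(axis=0)           # weighted batch mean
    var = (w[:, None] * (x - mean) ** 2).sum(axis=0)  # weighted batch variance
    return (x - mean) / np.sqrt(var + eps)
```

With uniform weights this reduces to standard batch normalization, which is the natural sanity check for the design.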
Pursuing Counterfactual Fairness via Sequential Autoencoder Across Domains
Recognizing the prevalence of domain shift as a common challenge in machine
learning, various domain generalization (DG) techniques have been developed to
enhance the performance of machine learning systems when dealing with
out-of-distribution (OOD) data. Furthermore, in real-world scenarios, data
distributions can change gradually across a sequence of domains.
While current methodologies primarily focus on improving model effectiveness
within these new domains, they often overlook fairness issues throughout the
learning process. In response, we introduce an innovative framework called
Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder
(CDSAE). This approach effectively separates environmental information and
sensitive attributes from the embedded representation of classification
features. This concurrent separation not only greatly improves model
generalization across diverse and unfamiliar domains but also effectively
addresses challenges related to unfair classification. Our strategy is rooted
in the principles of causal inference to tackle these dual issues. To examine
the intricate relationship between semantic information, sensitive attributes,
and environmental cues, we systematically categorize exogenous uncertainty
factors into four latent variables: 1) semantic information influenced by
sensitive attributes, 2) semantic information unaffected by sensitive
attributes, 3) environmental cues influenced by sensitive attributes, and 4)
environmental cues unaffected by sensitive attributes. By incorporating
fairness regularization, we exclusively employ semantic information for
classification purposes. Empirical validation on synthetic and real-world
datasets substantiates the effectiveness of our approach, demonstrating
improved accuracy levels while ensuring the preservation of fairness in the
evolving landscape of continuous domains.
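The four-way factorization described above can be sketched as a simple partition of an encoder's latent vector, with only the semantic factors forwarded to the classifier. The dimensions and function names below are illustrative assumptions; the actual CDSAE architecture is not specified in the abstract.

```python
import numpy as np

def split_latent(z, dims=(4, 4, 4, 4)):
    """Partition a latent vector z into the four factors from the text:
    (1) semantic info influenced by sensitive attributes,
    (2) semantic info unaffected by them,
    (3) environmental cues influenced by sensitive attributes,
    (4) environmental cues unaffected by them."""
    cuts = np.cumsum(dims)[:-1]
    return np.split(z, cuts)

def classifier_input(z, dims=(4, 4, 4, 4)):
    """Per the abstract, only semantic information is used for
    classification; environmental cues are dropped."""
    sem_sens, sem_clean, env_sens, env_clean = split_latent(z, dims)
    return np.concatenate([sem_sens, sem_clean])
```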
Multidimensional Uncertainty-Aware Evidential Neural Networks
Traditional deep neural networks (NNs) have contributed significantly to
state-of-the-art classification performance across various application
domains. However, NNs do not consider the inherent uncertainty in data
associated with class probabilities, and misclassification under uncertainty
can easily introduce high risk into real-world decision making (e.g.,
misclassification of objects on roads leads to serious accidents). Unlike
Bayesian NNs, which infer uncertainty indirectly through weight uncertainties,
evidential NNs (ENNs) have recently been proposed to explicitly model the
uncertainty of class probabilities and use it for classification tasks. An ENN
formulates the predictions of NNs as subjective opinions and uses a
deterministic NN to learn the function that collects, from data, the evidence
forming those opinions. However, an ENN is trained as a black box without
explicitly considering the inherent uncertainty in data and its different root
causes, such as vacuity (i.e., uncertainty due to a lack of evidence) or
dissonance (i.e., uncertainty due to conflicting evidence). By considering
this multidimensional uncertainty, we propose a novel uncertainty-aware
evidential NN called WGAN-ENN (WENN) for solving the out-of-distribution (OOD)
detection problem. We take a hybrid approach that combines a Wasserstein
Generative Adversarial Network (WGAN) with ENNs to jointly train a model with
prior knowledge of a certain class, which has high vacuity for OOD samples.
Through extensive empirical experiments on both synthetic and real-world
datasets, we demonstrate that WENN's uncertainty estimates can significantly
help distinguish OOD samples from boundary samples. WENN outperformed
competitive counterparts in OOD detection.
Comment: AAAI 202
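The vacuity/dissonance distinction above can be made concrete with the standard subjective-logic formulas over Dirichlet evidence. The sketch below assumes a uniform Dirichlet prior (strength S = sum of evidence + K) and the common balance-based dissonance measure; it is an illustration of the two uncertainty types, not WENN's training code.

```python
import numpy as np

def vacuity(evidence):
    """Uncertainty from a LACK of evidence: K / S, where S is the
    Dirichlet strength. With zero evidence, vacuity is maximal (1.0)."""
    K = len(evidence)
    S = evidence.sum() + K  # assumes a uniform Dirichlet prior
    return K / S

def dissonance(evidence):
    """Uncertainty from CONFLICTING evidence: large when strong evidence
    is split evenly across classes, zero when it all backs one class."""
    K = len(evidence)
    S = evidence.sum() + K
    b = evidence / S  # belief masses
    diss = 0.0
    for k in range(K):
        others = np.delete(b, k)
        if others.sum() > 0:
            # balance of each competing belief against b[k]
            bal = 1.0 - np.abs(others - b[k]) / (others + b[k] + 1e-12)
            diss += b[k] * (others * bal).sum() / others.sum()
    return diss
```

An OOD sample would ideally receive high vacuity (little evidence for any class), while an in-distribution boundary sample receives high dissonance (strong but conflicting evidence), which is exactly the separation the abstract reports.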
Dynamic Prompting: A Unified Framework for Prompt Tuning
It has been demonstrated that the art of prompt tuning is highly effective in
efficiently extracting knowledge from pretrained foundation models,
encompassing pretrained language models (PLMs), vision pretrained models, and
vision-language (V-L) models. However, the efficacy of employing fixed soft
prompts with a predetermined position for concatenation with inputs for all
instances, irrespective of their inherent disparities, remains uncertain.
Variables such as the position, length, and representations of prompts across
diverse instances and tasks can substantially influence the performance of
prompt tuning. In this context, we provide a theoretical analysis, which
reveals that optimizing the position of the prompt to encompass the input can
capture additional semantic information that traditional prefix or postfix
prompt tuning methods fail to capture. Building upon our analysis, we present a
unified dynamic prompt (DP) tuning strategy that dynamically determines
different factors of prompts based on specific tasks and instances. To
accomplish this, we employ a lightweight learning network with Gumbel-Softmax,
allowing us to learn instance-dependent guidance. Experimental results
underscore the significant performance improvement achieved by dynamic prompt
tuning across a wide range of tasks, including NLP tasks, vision recognition
tasks, and vision-language tasks. Furthermore, we establish the universal
applicability of our approach under full-data, few-shot, and multitask
scenarios. Codes are available at https://github.com/Xianjun-Yang/DPT.
Comment: updat
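The instance-dependent choice described above can be sketched with a Gumbel-Softmax over candidate prompt placements. The two-way prefix/postfix case below is a simplified assumption for illustration (the paper's DP strategy also varies length and position within the input); all names here are hypothetical.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Soft, differentiable-in-spirit sample from a categorical
    distribution via the Gumbel-Softmax trick."""
    rng = rng if rng is not None else np.random.default_rng(0)
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

def place_prompt(x_emb, prompt_emb, logits, rng=None):
    """Blend prefix vs. postfix placement of a soft prompt using
    instance-dependent logits (e.g., from a lightweight network).
    x_emb: (T, D) input embeddings; prompt_emb: (P, D) soft prompt."""
    w = gumbel_softmax(logits, rng=rng)
    prefix = np.concatenate([prompt_emb, x_emb], axis=0)
    postfix = np.concatenate([x_emb, prompt_emb], axis=0)
    return w[0] * prefix + w[1] * postfix
```

At low temperature tau the weights approach a hard one-hot choice, so the model effectively learns a discrete per-instance placement while staying trainable end to end.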
Open-ended Commonsense Reasoning with Unrestricted Answer Scope
Open-ended Commonsense Reasoning is defined as solving a commonsense question
without providing 1) a short list of answer candidates and 2) a pre-defined
answer scope. Conventional ways of formulating the commonsense question into a
question-answering form or utilizing external knowledge to learn
retrieval-based methods are less applicable in the open-ended setting due to an
inherent challenge. Without pre-defining an answer scope or a few candidates,
open-ended commonsense reasoning entails predicting answers by searching over
an extremely large search space. Moreover, most questions require implicit
multi-hop reasoning, which presents even more challenges to our problem. In
this work, we leverage pre-trained language models to iteratively retrieve
reasoning paths on the external knowledge base, which does not require
task-specific supervision. The reasoning paths can help to identify the most
precise answer to the commonsense question. We conduct experiments on two
commonsense benchmark datasets. Compared to other approaches, our proposed
method achieves better performance both quantitatively and qualitatively.
Comment: Findings of EMNLP 202
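The iterative retrieval of reasoning paths can be sketched as a beam search over a knowledge base, where a scoring function (in the paper, derived from a pre-trained language model; here a trivial stand-in) ranks candidate paths at each hop. The toy KB and names below are illustrative assumptions.

```python
def retrieve_paths(kb, start, score, hops=2, beam=2):
    """Iteratively expand reasoning paths over a toy knowledge base.
    kb maps an entity to a list of (relation, neighbor) edges; score
    ranks candidate paths, standing in for an LM plausibility score.
    Returns up to `beam` paths of the form [e0, r1, e1, r2, e2, ...]."""
    paths = [[start]]
    for _ in range(hops):
        cand = []
        for p in paths:
            for rel, nxt in kb.get(p[-1], []):
                cand.append(p + [rel, nxt])
        if not cand:
            break  # no further expansion possible
        cand.sort(key=score, reverse=True)
        paths = cand[:beam]
    return paths
```

The multi-hop aspect of the problem shows up directly: each loop iteration extends the path by one relation, and the final answer is read off the end of the best-scoring path.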
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
Large language models (LLMs) have significantly advanced the field of natural
language processing (NLP), providing a highly useful, task-agnostic foundation
for a wide range of applications. However, directly applying LLMs to solve
sophisticated problems in specific domains meets many hurdles, caused by the
heterogeneity of domain data, the sophistication of domain knowledge, the
uniqueness of domain objectives, and the diversity of the constraints (e.g.,
various social norms, cultural conformity, religious beliefs, and ethical
standards in the domain applications). Domain specialization techniques are
key to making large language models disruptive in many applications.
Specifically, to
solve these hurdles, there has been a notable increase in research and
practices conducted in recent years on the domain specialization of LLMs. This
emerging field of study, with its substantial potential for impact,
necessitates a comprehensive and systematic review to better summarize and
guide ongoing work in this area. In this article, we present a comprehensive
survey on domain specialization techniques for large language models, an
emerging direction critical for large language model applications. First, we
propose a systematic taxonomy that categorizes the LLM domain-specialization
techniques based on the accessibility to LLMs and summarizes the framework for
all the subcategories as well as their relations and differences to each other.
Second, we present an extensive taxonomy of critical application domains that
can benefit dramatically from specialized LLMs, discussing their practical
significance and open challenges. Last, we offer our insights into the current
research status and future trends in this area.